Accurate cross-architecture performance modeling for sparse matrix-vector multiplication (SpMV) on GPUs

Authors

  • Ping Guo
  • Liqiang Wang
Abstract

This paper presents an integrated analytical and profile-based cross-architecture performance modeling tool that provides inter-architecture performance prediction for sparse matrix-vector multiplication (SpMV) on NVIDIA GPU architectures. To design and build the tool, we investigate the inter-architecture relative performance of multiple SpMV kernels. Given a sparse matrix and its SpMV kernel performance measured on a reference architecture, our cross-architecture performance modeling tool can accurately predict its SpMV kernel performance on a target architecture. These predictions can help researchers choose, from a wide range of available computing architectures, the one that best fits their needs. We evaluate our tool with 14 widely used sparse matrices on four GPU architectures: NVIDIA Tesla C2050, Tesla M2090, Tesla K20m, and GeForce GTX 295. In our experiments, Tesla C2050 serves as the reference architecture and the other three as the target architectures. For Tesla M2090, the average differences between predicted and measured SpMV kernel execution times for the CSR, ELL, COO, and HYB kernels are 3.1%, 5.1%, 1.6%, and 5.6%, respectively. For Tesla K20m, they are 6.9%, 5.9%, 4.0%, and 6.6% on average, respectively; for GeForce GTX 295, they are 5.9%, 5.8%, 3.8%, and 5.9% on average, respectively. Copyright © 2013 John Wiley & Sons, Ltd.
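The abstract does not reproduce the model's equations, so the following Python sketch only illustrates the general ratio-based idea: scale a kernel time measured on the reference GPU by a per-format, per-target relative-performance factor. All numbers in the `RATIOS` table are invented placeholders, not values from the paper.

```python
# Hypothetical sketch of ratio-based cross-architecture prediction.
# The per-format ratios below are invented placeholders, NOT values
# from the paper; a real tool would calibrate them by profiling
# benchmark kernels on both the reference and target GPUs.

# RATIOS[target][fmt] ~ time(target) / time(reference), averaged over
# a set of calibration matrices, with Tesla C2050 as the reference.
RATIOS = {
    "Tesla M2090":     {"CSR": 0.85, "ELL": 0.80, "COO": 0.90, "HYB": 0.82},
    "Tesla K20m":      {"CSR": 0.55, "ELL": 0.50, "COO": 0.60, "HYB": 0.52},
    "GeForce GTX 295": {"CSR": 1.40, "ELL": 1.35, "COO": 1.50, "HYB": 1.38},
}

def predict_time(ref_time_ms: float, fmt: str, target: str) -> float:
    """Predict the SpMV kernel time on `target` from a measurement
    taken on the reference architecture."""
    return ref_time_ms * RATIOS[target][fmt]

# Example: a CSR kernel measured at 2.4 ms on the reference Tesla C2050.
print(predict_time(2.4, "CSR", "Tesla K20m"))  # -> 1.32 (hypothetical)
```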

Similar Articles

Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)

Although sparse matrix-vector multiplication (SpMV) algorithms are simple, they form important parts of linear algebra algorithms in mathematics and physics. Because these algorithms can be run in parallel, Graphics Processing Units (GPUs) have been considered among the best candidates for running them. In recent years, power consumption has been considered as one of the metr...

Yet another Hybrid Strategy for Auto-tuning SpMV on GPUs

Sparse matrix-vector multiplication (SpMV) is a key linear algebra algorithm and is widely used in many application domains. Besides multi-core architectures, there is also extensive research focusing on accelerating SpMV on many-core Graphics Processing Units (GPUs). SpMV computations have many indirect and irregular memory accesses, and load imbalance could occur while mapping computations ont...
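As a toy illustration of format auto-tuning (not the hybrid strategy this excerpt describes), the following CPU-side Python sketch picks the fastest SciPy storage format for a given matrix by timing each candidate; real GPU auto-tuners also inspect structural features such as row-length variance, which this sketch ignores.

```python
import time
import numpy as np
import scipy.sparse as sp

def best_format(A_dense: np.ndarray, trials: int = 50) -> str:
    """Pick the fastest storage format for y = A @ x by benchmarking.

    A CPU-side stand-in for GPU auto-tuning; the candidate set and the
    brute-force timing loop are illustrative choices, not the paper's.
    """
    x = np.random.rand(A_dense.shape[1])
    candidates = {"CSR": sp.csr_matrix, "CSC": sp.csc_matrix, "COO": sp.coo_matrix}
    timings = {}
    for name, convert in candidates.items():
        A = convert(A_dense)
        start = time.perf_counter()
        for _ in range(trials):
            A @ x  # sparse matrix-vector product
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)

# Example on a random matrix with ~2% nonzeros.
M = np.random.rand(500, 500) * (np.random.rand(500, 500) < 0.02)
print(best_format(M))
```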

Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU

In parallel programming, computation has moved from single-core to multicore architectures. Graphics Processing Units (GPUs) have recently emerged as outstanding platforms for data-parallel applications with regular data access patterns. However, it is still challenging to optimize computations with irregular data access patterns, such as sparse matrix-vector multiplication (SpMV). SpMV...
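The excerpt does not define QCSR, but a minimal Python version of the standard CSR traversal that such formats build on may help make the irregular access pattern concrete: column indices are read indirectly, so the reads of `x` are scattered.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Baseline CSR SpMV, y = A @ x: row_ptr[i]..row_ptr[i+1] delimits
    the nonzeros of row i, and col_idx scatters the reads of x."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]  # indirect, irregular access
    return y

# A = [[10, 0, 2],
#      [ 0, 3, 0]]
vals    = np.array([10.0, 2.0, 3.0])
cols    = np.array([0, 2, 1])
row_ptr = np.array([0, 2, 3])
print(spmv_csr(vals, cols, row_ptr, np.ones(3)))  # -> [12. 3.]
```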

Optimization of sparse matrix–vector multiplication using reordering techniques on GPUs

It is well known that reordering techniques applied to sparse matrices are common strategies for improving the performance of sparse matrix operations, particularly sparse matrix-vector multiplication (SpMV), on CPUs. In this paper, we have evaluated some of the most successful reordering techniques on two different GPUs. In addition, our study covers a number of sparse matrix storage formats ...
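As a hedged example of one classic reordering, the sketch below applies Reverse Cuthill-McKee (via SciPy) before SpMV; the excerpt does not say which techniques or storage formats the authors actually evaluated.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Build a random symmetric sparse matrix (symmetry lets us apply the
# same permutation to both rows and columns).
A = sp.random(1000, 1000, density=0.01, format="csr")
A = (A + A.T).tocsr()

# Reverse Cuthill-McKee reduces bandwidth, clustering nonzeros near the
# diagonal, which tends to improve locality of the x-vector reads.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm][:, perm]

x = np.random.rand(1000)
y = A_rcm @ x  # SpMV on the reordered matrix (the result is permuted too)
```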

Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs remains a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit the GPU architecture more efficiently. The COO portion of the matrix is partitioned ...
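Assuming HYB-R refines the standard HYB decomposition (the excerpt does not give its partitioning rule), this Python sketch shows the usual split: the first `K` nonzeros of each row go into a zero-padded ELL structure and the overflow into COO. `K` is a caller-chosen width introduced here for illustration.

```python
import numpy as np
import scipy.sparse as sp

def hyb_split(A: sp.csr_matrix, K: int):
    """Split A into an ELL part (first K nonzeros per row, zero-padded)
    and a COO overflow part. K is a caller-chosen column width; real
    HYB implementations pick it from the row-length distribution."""
    n = A.shape[0]
    ell_vals = np.zeros((n, K))
    ell_cols = np.zeros((n, K), dtype=np.int64)
    coo_rows, coo_cols, coo_vals = [], [], []
    for i in range(n):
        start, end = A.indptr[i], A.indptr[i + 1]
        cols, vals = A.indices[start:end], A.data[start:end]
        take = min(K, len(vals))
        ell_vals[i, :take] = vals[:take]
        ell_cols[i, :take] = cols[:take]
        coo_rows.extend([i] * (len(vals) - take))  # overflow entries
        coo_cols.extend(cols[take:])
        coo_vals.extend(vals[take:])
    return (ell_vals, ell_cols), (coo_rows, coo_cols, coo_vals)
```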

Journal:
  • Concurrency and Computation: Practice and Experience

Volume: 27   Issue: -

Pages: -

Published: 2015